Improving knowledge distillation using unified ensembles of specialized teachers
Authors
Abstract
The increasing complexity of deep learning models has led to the development of Knowledge Distillation (KD) approaches that enable us to transfer knowledge between a very large network, called the teacher, and a smaller and faster one, called the student. However, as recent evidence suggests, using powerful teachers often negatively impacts the effectiveness of the distillation process. In this paper, the reasons behind this apparent limitation are studied and an approach that transfers knowledge more efficiently is proposed. To this end, multiple highly specialized teachers are employed, each one for a small set of skills, overcoming the aforementioned limitation, while also achieving high efficiency by diversifying the ensemble. At the same time, the employed ensemble is formulated in a unified structure, making it possible to simultaneously train the models. The proposed method is demonstrated on three different image datasets, leading to improved performance, even when compared with state-of-the-art ensemble-based methods.
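To make the general idea concrete, the following is a minimal PyTorch sketch of distilling an ensemble of teachers into a single student. It assumes Hinton-style distillation with a temperature-softened KL term and a uniform average over teacher outputs; the function name, the temperature, and the weighting factor alpha are illustrative choices, not the exact formulation used in the paper.

    # Minimal sketch: distillation loss from an ensemble of (specialized) teachers.
    # The uniform averaging of teacher predictions and the hyperparameters below
    # are assumptions for illustration, not the paper's exact method.
    import torch
    import torch.nn.functional as F

    def ensemble_distillation_loss(student_logits, teacher_logits_list,
                                   labels, temperature=4.0, alpha=0.5):
        """Cross-entropy on ground-truth labels plus a KL term matching the
        student's softened predictions to the averaged softened predictions
        of the teacher ensemble."""
        # Soften each teacher's output and average over the ensemble.
        teacher_probs = torch.stack(
            [F.softmax(t / temperature, dim=1) for t in teacher_logits_list]
        ).mean(dim=0)

        # Softened student log-probabilities.
        student_log_probs = F.log_softmax(student_logits / temperature, dim=1)

        # KL divergence, scaled by T^2 as is standard in Hinton-style KD.
        kd_loss = F.kl_div(student_log_probs, teacher_probs,
                           reduction="batchmean") * temperature ** 2

        # Supervised loss on the true labels.
        ce_loss = F.cross_entropy(student_logits, labels)

        return alpha * kd_loss + (1.0 - alpha) * ce_loss

In practice, each teacher in the ensemble would be trained on (or specialized for) a different subset of classes or skills before its logits are fed to this loss during student training.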
Similar resources
Efficient Knowledge Distillation from an Ensemble of Teachers
This paper describes the effectiveness of knowledge distillation using teacher-student training for building accurate and compact neural networks. We show that with knowledge distillation, information from multiple acoustic models like very deep VGG networks and Long Short-Term Memory (LSTM) models can be used to train standard convolutional neural network (CNN) acoustic models for a variety of...
Knowledge distillation using unlabeled mismatched images
Current approaches for Knowledge Distillation (KD) either directly use training data or sample from the training data distribution. In this paper, we demonstrate the effectiveness of 'mismatched' unlabeled stimuli to perform KD for image classification networks. For illustration, we consider scenarios where there is a complete absence of training data, or where mismatched stimuli have to be used for augm...
Sequence-Level Knowledge Distillation
Neural machine translation (NMT) offers a novel alternative formulation of translation that is potentially simpler than statistical approaches. However to reach competitive performance, NMT models need to be exceedingly large. In this paper we consider applying knowledge distillation approaches (Bucila et al., 2006; Hinton et al., 2015) that have proven successful for reducing the size of neura...
Improving the distillation energy network
Energy costs are the largest percentage of a hydrocarbon plant's operating expenditures. This is especially true of the distillation process, which requires substantial energy consumption. Concerns over recent high costs and economic pressures continually emphasise the need for efficient distillation design and operation without a loss of performance. This article illustrates how energy-effici...
Apprentice: Using Knowledge Distillation Techniques to Improve Low-Precision Network Accuracy
Deep learning networks have achieved state-of-the-art accuracies on computer vision workloads like image classification and object detection. The performant systems, however, typically involve big models with numerous parameters. Once trained, a challenging aspect for such top performing models is deployment on resource constrained inference systems — the models (often deep networks or wide net...
Journal
Journal title: Pattern Recognition Letters
Year: 2021
ISSN: 1872-7344, 0167-8655
DOI: https://doi.org/10.1016/j.patrec.2021.03.014